Smoothed Action Value Functions
Abstract
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships, we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. Our approach is amenable to proximal optimization techniques by augmenting the objective with a penalty on KL divergence from a previous policy. We find that the ability to learn both a mean and covariance during training allows this approach to achieve much better results on standard continuous control benchmarks.
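To make the abstract's claims concrete, here is a sketch of the objects involved, reconstructed from the abstract's own description (the notation is ours, and the precise statements and conditions live in the paper, not on this page). For a Gaussian policy pi(a|s) = N(mu(s), Sigma(s)) with ordinary Q-function Q^pi, the smoothed Q-value and the relationships the abstract mentions take roughly the following form:

```latex
% Gaussian-smoothed Q-value:
\tilde{Q}^{\pi}(s, a) \;=\; \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\, \Sigma(s))}\big[\, Q^{\pi}(s, \tilde{a}) \,\big]

% A Bellman-style recursion, which makes \tilde{Q} learnable from sampled experience:
\tilde{Q}^{\pi}(s, a) \;=\; \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\, \Sigma(s))}\Big[\, r(s, \tilde{a}) \,+\, \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, \tilde{a})}\big[\, \tilde{Q}^{\pi}(s', \mu(s')) \,\big] \Big]

% Gradient identities (Bonnet/Price-style results for Gaussian expectations):
\nabla_{\mu(s)}\, \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q^{\pi}(s, a) \big]
  \;=\; \nabla_{a}\, \tilde{Q}^{\pi}(s, a)\,\big|_{a = \mu(s)}
\nabla_{\Sigma(s)}\, \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q^{\pi}(s, a) \big]
  \;=\; \tfrac{1}{2}\, \nabla^{2}_{a}\, \tilde{Q}^{\pi}(s, a)\,\big|_{a = \mu(s)}
```

These identities suggest a direct update rule for the policy mean and covariance once a smoothed Q-value approximator has been learned. The following is a minimal JAX sketch under stated assumptions: q_tilde is a toy quadratic stand-in for a learned smoothed Q-network, and the step size and plain gradient-ascent update are illustrative, not the paper's full algorithm (which also covers the KL-penalized proximal objective and keeping the covariance positive definite).

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a learned smoothed Q-value approximator Q~(s, a).
# In the paper this would be a trained function approximator; here it
# is a quadratic in the action, purely for illustration.
def q_tilde(state, action):
    return -jnp.sum((action - jnp.tanh(state)) ** 2)

# Gradient of expected reward w.r.t. the policy mean:
# grad_a Q~(s, a) evaluated at a = mu(s).
grad_q = jax.grad(q_tilde, argnums=1)

# Gradient w.r.t. the covariance: (1/2) * Hessian_a Q~(s, a) at a = mu(s).
hess_q = jax.hessian(q_tilde, argnums=1)

state = jnp.array([0.3, -1.2])
mu = jnp.zeros(2)           # current policy mean at this state
sigma = 0.1 * jnp.eye(2)    # current policy covariance
lr = 1e-2                   # illustrative step size

mu_new = mu + lr * grad_q(state, mu)
sigma_new = sigma + lr * 0.5 * hess_q(state, mu)

print(mu_new)     # the mean moves toward higher smoothed Q-value
print(sigma_new)  # the covariance shrinks where Q~ is strongly concave
```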
Similar Resources
Smoothed Action Value Functions
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment....
Smoothed Action Value Functions for Learning Gaussian Policies
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. ...
Smoothed Analysis of Trie Height
Tries are very simple general purpose data structures for information retrieval. A crucial parameter of a trie is its height. In the worst case the height is unbounded when the trie is built over a set of n strings. Analytical investigations have shown that the average height under many random sources is logarithmic in n. Experimental studies of trie height suggest that this holds for non-rando...
Nonsmooth Matrix Valued Functions Defined by Singular Values
A class of matrix valued functions defined by singular values of nonsymmetric matrices are shown to have many properties analogous to matrix valued functions defined by eigenvalues of symmetric matrices. In particular, the (smoothed) matrix valued Fischer-Burmeister function is proved to be strongly semismooth everywhere. This result is also used to show the strong semismoothness of the (smooth...
Value Function Approximation in Noisy Environments Using Locally Smoothed Regularized Approximate Linear Programs
Recently, Petrik et al. demonstrated that L1-Regularized Approximate Linear Programming (RALP) could produce value functions and policies which compared favorably to established linear value function approximation techniques like LSPI. RALP’s success primarily stems from the ability to solve the feature selection and value function approximation steps simultaneously. RALP’s performance guarantee...